AITopics

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)

Neural Information Processing Systems, Feb-19-2026, 12:00:35 GMT

ce9e92e3de2372a4b93353eb7f3dc0bd-Supplemental-Datasets_and_Benchmarks.pdf

crowdsourced data, dataset, pipeline, (14 more...)

Country:

Africa > Niger (0.07)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > Vietnam (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.31)

Neural Information Processing Systems, Feb-18-2026, 04:21:54 GMT

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Sukmin Yun

We hope our work will contribute to the development of general MLLMs suitable for web-based content generation and task automation.

large language model, machine learning, natural language, (20 more...)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Goldenits, Georg, Koenig, Philip, Raubitzek, Sebastian, Ekelhart, Andreas

Small Language Models for Phishing Website Detection: Cost, Performance, and Privacy Trade-Offs

arXiv.org Artificial Intelligence, Nov-20-2025

Phishing websites pose a major cybersecurity threat, exploiting unsuspecting users and causing significant financial and organisational harm. Traditional machine learning approaches for phishing detection often require extensive feature engineering, continuous retraining, and costly infrastructure maintenance. At the same time, proprietary large language models (LLMs) have demonstrated strong performance in phishing-related classification tasks, but their operational costs and reliance on external providers limit their practical adoption in many business environments. This paper investigates the feasibility of small language models (SLMs) for detecting phishing websites using only their raw HTML code. A key advantage of these models is that they can be deployed on local infrastructure, giving organisations greater control over their data and operations. We systematically evaluate 15 commonly used SLMs, ranging from 1 billion to 70 billion parameters, benchmarking their classification accuracy, computational requirements, and cost-efficiency. Our results highlight the trade-offs between detection performance and resource consumption, showing that, although SLMs underperform state-of-the-art proprietary LLMs, they can still provide a viable and scalable alternative to external LLM services. By presenting a comparative analysis of costs and benefits, this work lays the foundation for future research on the adaptation, fine-tuning, and deployment of SLMs in phishing detection systems, aiming to balance security effectiveness and economic practicality.
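The abstract describes classifying pages from raw HTML with locally deployed SLMs but does not state the prompt or decoding setup. The following is a minimal sketch of such a pipeline, assuming a zero-shot prompt, a hypothetical 4,000-character truncation budget (`MAX_HTML_CHARS`), and a local model wrapped as a plain `generate(prompt) -> str` callable. The names `build_prompt`, `parse_label`, and `classify` are illustrative only, not the authors' code.

```python
import re

# Hypothetical budget (assumption, not the paper's setting): small models have
# limited context windows, so raw HTML is truncated before prompting.
MAX_HTML_CHARS = 4000

PROMPT_TEMPLATE = (
    "You are a security analyst. Classify the following webpage as "
    "'phishing' or 'benign' based only on its raw HTML.\n\n"
    "HTML:\n{html}\n\n"
    "Answer with exactly one word: phishing or benign."
)


def build_prompt(html: str, max_chars: int = MAX_HTML_CHARS) -> str:
    """Collapse runs of whitespace and truncate the HTML to the prompt budget."""
    compact = re.sub(r"\s+", " ", html).strip()
    return PROMPT_TEMPLATE.format(html=compact[:max_chars])


def parse_label(response: str) -> str:
    """Map a free-text model answer to 'phishing', 'benign', or 'unknown'."""
    text = response.lower()
    has_phish = "phishing" in text
    has_benign = "benign" in text
    if has_phish and not has_benign:
        return "phishing"
    if has_benign and not has_phish:
        return "benign"
    # Ambiguous or off-format answers are flagged rather than guessed.
    return "unknown"


def classify(html: str, generate) -> str:
    """`generate` is any callable str -> str wrapping a locally hosted model."""
    return parse_label(generate(build_prompt(html)))
```

Mapping ambiguous answers to `"unknown"` instead of forcing a label keeps off-format outputs from small models visible when accuracy is computed, rather than silently counting them as one class.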

large language model, machine learning, natural language, (20 more...)

2511.15434

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.54)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems, Oct-10-2025, 16:40:58 GMT

cb66be286795d71f89367d596bf78ea7-Paper-Datasets_and_Benchmarks_Track.pdf

dataset, instruction, webpage, (14 more...)

Country: Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Neural Information Processing Systems, Aug-19-2025, 00:51:29 GMT

Appendix

Figure 9: Example showing how a single line of HTML code is rendered by a browser.

artificial intelligence, natural language, social media, (17 more...)

Country:

Africa > Niger (0.07)
Europe > Germany > Saxony > Leipzig (0.04)
Asia > Vietnam (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.31)

Neural Information Processing Systems, May-27-2025, 16:34:43 GMT

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

large-scale webpage-to-code dataset, web2code, webpage-to-code dataset and evaluation framework, (5 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Vu, Tung D., Hoang, Chung, Hy, Truong-Son

Multimodal graph representation learning for website generation based on visual sketch

arXiv.org Artificial Intelligence, Apr-29-2025

The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. In this paper, we propose a novel method that leverages multimodal graph representation learning to address these challenges. By integrating both visual and structural information from design sketches, our approach enhances the accuracy and efficiency of code generation, particularly in producing semantically correct and structurally sound HTML code. A comprehensive evaluation demonstrates significant improvements in both accuracy and efficiency over existing techniques, highlighting the potential of multimodal graph learning for design-to-code automation. The problem lies at the intersection of computer vision, natural language processing, and programming, and it is particularly demanding when generating HTML code from webpage designs, because this requires not only interpreting visual elements but also understanding their spatial arrangements and hierarchical relationships.
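The abstract does not specify how the graph is built from a sketch. As one hypothetical illustration of capturing "spatial arrangements and hierarchical relationships", the sketch below assumes UI elements have already been detected as labelled bounding boxes (`Box`, an invented type), links each box to the smallest box that strictly contains it, and chains siblings in top-to-bottom, left-to-right reading order. The resulting edge lists could then feed a graph neural network; this is not the authors' construction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detected sketch element: a label plus pixel bounds."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def contains(self, other: "Box") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


def containment_edges(boxes):
    """Return (parent, child) index pairs: each box's smallest strict container.

    Requiring a strictly larger area avoids cycles between identical boxes.
    """
    edges = []
    for i, child in enumerate(boxes):
        parents = [j for j, b in enumerate(boxes)
                   if j != i and b.contains(child) and b.area() > child.area()]
        if parents:
            p = min(parents, key=lambda j: boxes[j].area())
            edges.append((p, i))
    return edges


def reading_order_edges(boxes, edges):
    """Chain siblings (same parent, or both top-level) in reading order."""
    child_ids = {c for _, c in edges}
    groups = {None: [i for i in range(len(boxes)) if i not in child_ids]}
    for p, c in edges:
        groups.setdefault(p, []).append(c)
    order = []
    for kids in groups.values():
        kids.sort(key=lambda k: (boxes[k].y0, boxes[k].x0))
        order.extend(zip(kids, kids[1:]))
    return order
```

For a page box containing a header (which contains a button) above a text block, the containment edges form the hierarchy page → header → button and page → text, while the reading-order edge connects header to text.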

large language model, machine learning, natural language, (19 more...)

2504.18729

Country:

Asia (0.68)
North America > United States > Alabama (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

arXiv.org Artificial Intelligence, Oct-23-2024

WAFFLE: Multi-Modal Model for Automated Front-End Development

Liang, Shanchao, Jiang, Nan, Qian, Shangshu, Tan, Lin

Web development involves turning UI designs into functional webpages, which can be difficult for both beginners and experienced developers due to the complexity of HTML's hierarchical structures and styles. While Large Language Models (LLMs) have shown promise in generating source code, two major challenges persist in UI-to-HTML code generation: (1) effectively representing HTML's hierarchical structure for LLMs, and (2) bridging the gap between the visual nature of UI designs and the text-based format of HTML code. To tackle these challenges, we introduce Waffle, a new fine-tuning strategy that uses a structure-aware attention mechanism to improve LLMs' understanding of HTML's structure and a contrastive fine-tuning approach to align LLMs' understanding of UI images and HTML code. Models fine-tuned with Waffle show up to 9.00 pp (percentage points) higher HTML match, 0.0982 higher CW-SSIM, 32.99 higher CLIP, and 27.12 pp higher LLEM on our new benchmark, WebSight-Test, and on the existing Design2Code benchmark, outperforming current fine-tuning methods.
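To make "structure-aware attention" concrete, the sketch below builds a boolean attention mask from the DOM element each token belongs to. The rule used here is a simplified assumption, not necessarily Waffle's exact definition: attention stays causal, and token *i* may attend to token *j* only if *j* lies in the same element, in the parent element, or in a sibling element. The inputs `token_elem` and `parent` are hypothetical stand-ins for a real tokenizer-to-DOM alignment.

```python
def structure_aware_mask(token_elem, parent):
    """Return mask[i][j] == True if token i may attend to token j.

    token_elem: DOM element id for each token position (hypothetical alignment).
    parent: dict mapping element id -> parent element id (root maps to None).
    Simplified rule (assumption): causal (j <= i) AND token j belongs to
    i's own element, i's parent element, or a sibling of i's element.
    """
    n = len(token_elem)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        ei = token_elem[i]
        for j in range(i + 1):  # causal: only current and earlier positions
            ej = token_elem[j]
            same = ej == ei
            is_parent = parent.get(ei) == ej
            sibling = (parent.get(ei) is not None
                       and parent.get(ej) == parent.get(ei))
            mask[i][j] = same or is_parent or sibling
    return mask
```

With the tree `html(0) → body(1) → {div(2), p(3)}` and tokens assigned to elements `[0, 1, 2, 2, 3]`, the token in `p` can see `body` and both `div` tokens but not `html`, which illustrates how such a mask narrows attention to structurally nearby tokens.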

large language model, machine learning, natural language, (22 more...)

2410.18362

Country:

North America > United States > California > San Diego County > San Diego (0.04)
North America > Dominican Republic (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

arXiv.org Artificial Intelligence, Sep-14-2024

IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web

Guo, Hongcheng, Zhang, Wei, Chen, Junhao, Gu, Yaonan, Yang, Jian, Du, Junjia, Hui, Binyuan, Liu, Tianyu, Ma, Jianxin, Zhou, Chang, Li, Zhoujun

Recent advancements in large multimodal models have led to significant strides in image comprehension. Despite these advancements, there is no robust benchmark specifically for assessing the Image-to-Web conversion proficiency of these models. First, it is essential to ensure the integrity of the generated web elements, which fall into visible and invisible categories. Previous evaluation methods (e.g., BLEU) are notably susceptible to large score changes caused by invisible elements in web code. Furthermore, it is crucial to measure the layout information of web pages, that is, the positional relationships between elements, which previous work has overlooked. To address these challenges, we have curated and aligned a benchmark of images and corresponding web code (IW-Bench). Specifically, we propose Element Accuracy, which tests the completeness of the elements by parsing the Document Object Model (DOM) tree. We also propose Layout Accuracy, which analyzes the positional relationships of elements by converting the DOM tree into a sequence and comparing common subsequences. In addition, we design a five-hop multimodal Chain-of-Thought prompting strategy for better performance, consisting of: (1) SoM prompt injection, (2) inferring elements, (3) inferring layout, (4) inferring web code, and (5) reflection. Our benchmark comprises 1,200 pairs of images and web code with varying levels of difficulty. We have conducted extensive experiments on existing large multimodal models, offering insights into their performance and areas for improvement in the image-to-web domain.
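The two IW-Bench metrics can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the benchmark's reference implementation: elements are matched by tag name only (ignoring attributes, text, and the visible/invisible distinction), Element Accuracy is taken as the multiset fraction of reference tags present in the output, and Layout Accuracy is taken as the longest common subsequence of document-order tag sequences divided by the reference length. The helper names are hypothetical; only the standard-library `html.parser` and `collections` modules are used.

```python
from collections import Counter
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Record start tags in document order (a pre-order walk of the DOM)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> list:
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def element_accuracy(reference_html: str, generated_html: str) -> float:
    """Fraction of reference elements (multiset of tags) found in the output."""
    ref = Counter(tag_sequence(reference_html))
    gen = Counter(tag_sequence(generated_html))
    total = sum(ref.values())
    if total == 0:
        return 1.0
    return sum((ref & gen).values()) / total


def _lcs_length(a, b) -> int:
    """Standard O(len(a) * len(b)) dynamic-programming LCS length."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def layout_accuracy(reference_html: str, generated_html: str) -> float:
    """LCS of document-order tag sequences, normalized by reference length."""
    ref = tag_sequence(reference_html)
    gen = tag_sequence(generated_html)
    if not ref:
        return 1.0
    return _lcs_length(ref, gen) / len(ref)
```

The split shows why two metrics are useful. If a generated page contains the right elements in the wrong order, Element Accuracy stays high while Layout Accuracy drops, because the LCS only rewards elements that appear in the same relative order as in the reference.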

large language model, machine learning, natural language, (20 more...)

2409.1898

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Singapore (0.04)
North America > Canada > Ontario > Toronto (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Education (0.93)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
(2 more...)